Efficient 3D AlexNet Architecture for Object Recognition Using Syntactic Patterns from Medical Images

In computer vision and medical image processing, object recognition is a primary concern today. Humans require only a few milliseconds of visual stimulation for object recognition. This motivated the development, in this study, of a computer-specific pattern recognition method for identifying objects in medical images, such as brain tumors. Initially, an adaptive median filter is used to remove the noise from MRI images. Thereafter, a contrast image enhancement technique is used to improve the quality of the image. To evaluate the wireframe model, the cellular logic array processing (CLAP)-based algorithm is then applied to the images. The basic patterns of three-dimensional (3D) images are then identified from the input image by scanning the whole image. The frequency of these patterns is also used for object classification. A deep neural network is then utilized for the classification of brain tumors. In the proposed model, the syntactic pattern recognition technique is used to find the feature vector and 3D AlexNet is used for brain tumor classification. To evaluate the performance of the proposed work, four benchmark brain tumor datasets are used, i.e., the Figshare, Brain MRI Kaggle, Medical MRI, and BraTS 2019 datasets. The comparative analyses reveal that the proposed brain tumor classification model achieves significantly better performance than the existing models.


Introduction
Object recognition is an open-ended problem in the field of computer vision and medical image processing. To recognize an object, a human being does not need to do much, since we have a processing system with billions of neurons: the brain. We require just a few milliseconds of visual stimulation to recognize an object, but a computer does not work like this because it needs more information for object recognition. A whole image can be given as an input, but that imposes computational limitations. Therefore, a generalized representation of different objects in one domain can give different identities to the same domain's objects.
This generalized representation of objects can be further used for the identification of different diseases such as lung cancer and brain tumors. As image processing techniques play an essential role in the diagnosis and monitoring of patients, the biomedical field has gained much attention from researchers, and many advanced artificial intelligence-based techniques are available for the early detection of disease. However, due to the unpredictable nature of cancer, more advanced techniques are still needed.
Brain cancer is a life-threatening disease, and approximately 80,000 new cases are reported every year [1,2]. The treatment of brain cancer depends on the early and accurate detection of the tumor, and detection in turn depends on the tumor type, the type of pathology, and the time of the first investigation of the disease. Generally, brain tumors involve the irregular propagation of brain cells [3,4], and according to the research, three types of brain tumors are identified: glioma, meningioma, and pituitary. Most neurologists use CAD-based models to detect and classify brain tumors [5]. According to our literature survey, most researchers have used segmentation-based techniques to classify brain tumors [6][7][8]. In the last decade, statistical classification and detection methods have been satisfactory but not a hundred percent accurate [9][10][11][12][13].
Therefore, in this study, a wireframe model of the image has been developed to detect the brain tumor from the patient's MRI. We propose an efficient approach based on structural pattern recognition and a deep neural network, because structural pattern recognition techniques can represent the patterns of both simple and complex images. A medical image contains a lot of information, and processing these images is intricate. Hence, a wireframe model is determined and used for further processing. The wireframe model is a skeletal or edge representation of 3D objects using lines and curves [14][15][16][17]. Other basic patterns are identified in the 3D image using the syntactic pattern approach, and finally, these patterns and their frequencies are calculated for classification. Deep learning-based algorithms play an essential role in the medical field, and according to the literature survey, the AlexNet-based architecture detects cancer in the early stage more accurately [3,[18][19][20][21][22][23][24]. Therefore, the 3D AlexNet architecture is used for the classification of tumor type.
The primary motivation of this research was to give new insight into the structural pattern recognition technique, because structural pattern recognition techniques can deal with complex medical images [25][26][27][28][29][30]. Keeping this in mind, we set the following objectives:
(i) An efficient object recognition model is proposed using syntactic patterns and 3D AlexNet for medical images.
(ii) Adaptive median filter and contrast image enhancement techniques are used to improve the quality of MRI images.
(iii) To evaluate the wireframe model of images, the cellular logic array processing (CLAP)-based algorithm is utilized.
(iv) Basic patterns of three-dimensional (3D) images are identified by scanning the MRI images. The frequency of these patterns is also used for object classification.
(v) The performance of the proposed work is validated on four benchmark brain tumor datasets: the Figshare, Brain MRI Kaggle, Medical MRI, and BraTS 2019 datasets.
This study is further divided into four sections to achieve the objectives mentioned above. The second section covers the literature survey. The third and fourth sections cover the proposed methodology and results. The conclusion is discussed in the fifth section.

Literature Survey
In the processing of 3D images, superficial and volumetric features play an important role. Edges that intersect each other are referred to as volumetric edges, and edges that separate two planes are called superficial edges. Lui et al. [6] proposed a gradient-based method for the detection of 3D edges. Rosenfeld et al. proposed an edge detection method using the surface magnitude. The construction of the wireframe model was first proposed by Mukherjee et al. [7]. A thinning and segmentation-based approach is used to evaluate the wireframe model. The adjacent segments are connected using the vertex merging method, and finally, the wireframe model is constructed.
Ren et al. [8] implemented a 3D viewpoint and shape estimation-based model to perform wireframe modeling. A CNN is used for the detection of the 3D viewpoints. Prakoonwit and Benjamin [9] proposed 3D surface reconstruction and wireframe modeling from 2D images. In this approach, the contours of 2D images are combined and a wireframe model is created. According to the survey, most authors have used the viewpoint-based method for the construction of the wireframe model. Considerable research growth is observed in the field of statistical pattern recognition-based techniques [10], but structural recognition still needs researchers' attention. The structural pattern recognition technique can deal with simple and complex patterns, and intricate patterns can be further divided into sub-patterns. Structural pattern recognition can be used for object recognition, NLP, and scene analysis [11][12][13].
Ali et al. [13] presented a method for the recognition of Bengali digits. Rashid and Ali [14] proposed a method in which eight directional codes are identified and used to convert the image into a feature vector, but the technique is only applicable to Bengali digits. Pal and Chaudhuri [15] also proposed a method for the recognition of Bengali numerals. This technique was able to recognize the numerals without thinning and normalization operations. Jaydeb et al. [16] proposed a method based on part-of-speech (POS) tagging and parsing. This method is used to find answers in a given text.
Hang [17] presented research on the A* algorithm for parsing techniques and proposed a hierarchical graph method (HGM model) based on the same algorithm. This method can replace the existing virtual node method (VNM), which was used to overcome the problem of Chomsky standard forms. The performance of HGM was better than that of VNM.
For the last three decades, researchers have extensively used the machine learning approach in various areas, including medical diagnostics. There are a limited number of studies in which researchers specifically targeted brain tumor diagnostic problems. Table 1 summarizes the existing methods for brain tumor detection.
Most researchers have used statistical pattern recognition techniques for object recognition. As per our literature survey, there is no evidence of structural pattern recognition-based methods being applied in the medical field. Recently, researchers have started using deep learning-based algorithms for the segmentation and detection of brain tumors. Hence, we used syntactic patterns for the classification of brain tumors. This is an efficient approach that helps detect and classify brain tumors in the early stage. A detailed description of the proposed methodology is given in the next section.

Proposed Methodology
This section summarizes the proposed methodology. Inspired by [2], the step-by-step procedure of the proposed classification model is designed (see Figure 1).

Image Preprocessing. MRI images may contain different types of noise that degrade the image's quality and may prevent the image from providing the correct information required to detect a cancerous tumor. An efficient denoising and image enhancement technique is required to preserve the edges and the contours of the medical images. Therefore, these techniques are directly helpful in the detection and classification of the image. In the proposed methodology, the adaptive median filter is used for image denoising and a contrast-based technique is used to enhance the image.
3.1.1. Denoising of the Image. Preprocessing techniques are used to remove the noise from the images. Hence, these techniques are directly helpful in the detection and classification of the image. An adaptive median filter is used to remove the noise from the image. It compares each pixel of the image to its neighborhood pixels. If a pixel value is drastically different from its neighborhood, that pixel is considered noise, and the adaptive median filter replaces the noisy pixel value with the median value of the neighborhood pixels. Algorithm 1 of the adaptive median filter is as follows.
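Since only a few steps of Algorithm 1 are available in this text, the following is a minimal sketch of the textbook adaptive median filter, not necessarily the exact variant used in this work. The function name `adaptive_median_filter`, the 2D list-of-lists image layout, the clipping at image borders, and the default `max_window` (playing the role of Imax) are assumptions made for illustration.

```python
def adaptive_median_filter(image, max_window=7):
    """Textbook adaptive median filter on a 2D list of gray levels.

    Assumption: `max_window` is the largest odd window size tried (Imax).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            window = 3
            while True:
                half = window // 2
                # Gather the neighborhood, clipping it at the image border.
                values = sorted(
                    image[i][j]
                    for i in range(max(0, r - half), min(rows, r + half + 1))
                    for j in range(max(0, c - half), min(cols, c + half + 1))
                )
                z_min, z_max = values[0], values[-1]
                z_med = values[len(values) // 2]
                if z_min < z_med < z_max:
                    # The median is not an extreme value, so it is trusted.
                    # Replace the pixel only if the pixel itself is an extreme.
                    if not (z_min < image[r][c] < z_max):
                        out[r][c] = z_med
                    break
                window += 2  # increase the size of the window
                if window > max_window:
                    out[r][c] = z_med  # window limit reached: use the median
                    break
    return out
```

For example, a single bright impulse in an otherwise flat 5 × 5 patch is replaced by the background value, while a pixel that lies strictly between the local minimum and maximum is left unchanged.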

Image Enhancement.
After removing the noise from the image, the next step is image enhancement. The image enhancement technique is used to improve the image's overall quality and sharpen the edges.
In the proposed method, the contrast stretching image enhancement technique is used, as shown in Algorithm 2.
This method stretches the range of intensity values, which enhances the image's quality as shown in Figure 2. First, the limits over which the intensity values will be extended are determined. These limits lie between 0 and 255 in an 8-bit grayscale image. Further, a histogram of the original image is examined to determine the limits of the original image. If the input image already covers the full possible set of values, contrast stretching is not required. Otherwise, if the input image data lie within a restricted range, stretching is applied. For each image, the original value r is mapped to the output value O using the following function.
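The mapping function itself is not reproduced in this text. A common form of contrast stretching is the linear min-max mapping O = (r - c)(b - a)/(d - c) + a, where c and d are the lowest and highest intensities in the input and a and b are the output limits. The sketch below assumes that form; the name `contrast_stretch` and the symbols a, b, c, and d are illustrative and do not come from the original.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Linear min-max contrast stretching of a 2D list of gray levels.

    Assumed mapping: O = (r - c) * (b - a) / (d - c) + a, with
    c, d = input limits and a, b = output limits (out_min, out_max).
    """
    flat = [p for row in image for p in row]
    in_min, in_max = min(flat), max(flat)
    # A constant image, or one that already spans the full range,
    # is returned unchanged (stretching is not required).
    if in_max == in_min or (in_min == out_min and in_max == out_max):
        return [row[:] for row in image]
    span_out = out_max - out_min
    span_in = in_max - in_min
    return [
        [round((p - in_min) * span_out / span_in + out_min) for p in row]
        for row in image
    ]
```

With the default limits, an image whose intensities lie between 50 and 200 is spread over the full 0 to 255 range, while an image that already spans 0 to 255 passes through untouched.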
3.2. Wireframe of the Image. In this step, the preprocessed image is given as an input, and a wireframe model of the image is then constructed using the CLAP-based model for 3D images [39][40][41]. The proposed approach works on the abrupt change in the gray level value of the neighborhood pixels. The algorithm locates the points over which the gray level is not changing, and it works on all the planes of the 3D image. For the scanning of the 3D image, a 3 × 3 × 3 sliding window is used [42,43]. The pixels establish convex polyhedrons with every movement of the sliding window; thus, at least one pixel in every column and row provides convex polyhedrons. If such convex polyhedrons exist, the central pixel is set to zero, and the same is repeated until the sliding window has scanned the whole 3D image. The performance of the algorithm is evaluated on various MRI datasets, and it is represented in Figure 3. The step-by-step flow of the wireframe model is presented in Algorithm 2.

Algorithm 1: Adaptive median filter (Steps 4, 4a, and 7).
Step 4: Increase the size of the window.
Step 4a: If window_size < Imax, repeat Steps 2 and 3; else the result is Xmedian.
Step 7: Stop.

Knowledge Vector Representation of the Image.
There are different types of shape and texture features available in the image. The knowledge vector representation of the basic 3D patterns [44,45] gives enough information to classify an object. In this step, we find the basic patterns of the 3D image. The following methodology is used to identify the patterns in the image with the syntactic pattern recognition technique. The patterns are illustrated in Figure 4. Here, $P_1$ represents the first pattern that could be present in the image, and it shows that all the corner pixels are present. $Q_1$ represents that pixel 1 is not present in the 3 × 3 × 3 window. In the same way, $R_{1,3}$ represents that corner pixels 1 and 3 are not present in the 3 × 3 × 3 window, and $R_{1,7}$ represents that corner pixels 1 and 7 are not present. We need to find these patterns in the image and then calculate their frequency. The naming convention is defined for all 256 patterns. Figure 5 shows the naming convention of each pixel position. Figures 5(a) and 5(b) represent the 27-neighborhood structure of the 3 × 3 × 3 window and the co-ordinates, respectively. In a 3 × 3 × 3 image, the center pixel (14) is surrounded by 26 pixels, but in our research, we considered only the corner pixels, which are numbered 1, 3, 7, 9, 19, 21, 25, and 27. Since each of these eight corner pixels is either present or absent, the number of possible ways in which the corner pixels could be present is $2^{8} = 256$ (2). The possible ways in which a corner pixel could be present in the image are shown in Table 2.
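As an illustration of how the eight corner pixels can index one of the 256 patterns, the sketch below encodes the presence of each corner as one bit. It assumes that pixel numbers 1 to 27 follow raster order within the window (x fastest, then y, then z), which is consistent with 14 being the center and with the listed corner numbers, and that a nonzero value means "present". The bit ordering and the name `corner_pattern` are hypothetical and are not the naming convention of Figure 5.

```python
# Corner pixel numbers of the 3 x 3 x 3 window, as listed in the text.
CORNERS = (1, 3, 7, 9, 19, 21, 25, 27)


def corner_pattern(window):
    """Return an index in 0..255 describing which corner pixels are present.

    `window` is a 3 x 3 x 3 nested list indexed as window[z][y][x].
    Assumption: pixel n (1-based) is the n-th element in raster order.
    """
    flat = [window[z][y][x] for z in range(3) for y in range(3) for x in range(3)]
    index = 0
    for bit, pixel_number in enumerate(CORNERS):
        if flat[pixel_number - 1]:  # nonzero means the corner is present
            index |= 1 << bit
    return index
```

Under this encoding, a window with all corners present maps to index 255, and removing corner pixel 1 clears the lowest bit.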

Run a 3 × 3 × 3 Window on an Input Image and Check for the Patterns.
The 2D slices can be combined to form a 3D structure (3D volumetric data). The proposed feature extraction algorithm is written in such a way that it can work with both 2D slices and 3D volumetric datasets. The proposed method takes as input a 3D volume or a sequence of 2D frames (e.g., slices in a CT scan). If the input is a 2D slice and the scanning window size is 3 × 3, then a central pixel can be surrounded by eight neighborhood pixels. In the naming conventions of the preferred direction, F represents the front direction and B represents the backward direction. Other conventions are the same as in 2D.

Wireframe algorithm (final steps):
If all pixel values read by the sliding window (SW) contain at least one pixel in every column and row, then replace the central pixel with zero. Otherwise, move SW to read the next pixels.
(4) Concatenate all tracked edges that are obtained from step 3 and build a wireframe model.

The wireframe model of the image is considered as the input for this step, and the detailed explanation is given as follows.
(1) Finding the Patterns in the Image and Calculating their Frequency Using a Stride of 3. Consider a whole image and a 3 × 3 × 3 window. This window slides over the image and identifies the pattern that it encapsulates. If a portion of the image matches one of the patterns, the frequency value of that particular pattern is updated. To go to the next 3 × 3 × 3 section of the image, the window moves with a stride of three. This procedure is repeated until the end of the image has been reached.
The three-step stride may necessitate padding around the edges of an image in order to retrieve its information. Note that the stride of 3 applies in the X, Y, and Z directions. Once one row of 3 × 3 × 3 blocks has been scanned, the next row is obtained by sliding the window 3 pixels in the downward direction.
(2) Finding the Frequency of All the Patterns Obtained (3D Pattern Frequency Vector). The patterns found in the previous step are counted using the vocabulary of 256 patterns, and their frequencies are calculated. The frequencies of all the patterns are represented as an array, called the 3D pattern frequency vector, or 3D-PFV for short. The length of this vector is 256, one entry for the frequency of each pattern. The order of the pattern frequencies in the array must be consistent: if pattern 2 is at index 1 of the 3D-PFV for one image, then it must be at the same index for all the images. The same goes for the frequencies of all the other patterns. This 3D-PFV representation of the image is a mapping between the vector and the object.
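The two steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the volume is a nested list indexed as volume[z][y][x], a nonzero voxel counts as "present", voxels outside the volume are treated as zero (one possible padding choice), and a pattern's position in the vector is given by a hypothetical bit encoding of the eight corner voxels. The name `pattern_frequency_vector` is also hypothetical.

```python
# Offsets of the eight corner voxels inside a 3 x 3 x 3 window (z, y, x).
CORNER_OFFSETS = [(z, y, x) for z in (0, 2) for y in (0, 2) for x in (0, 2)]


def pattern_frequency_vector(volume, stride=3):
    """Count corner patterns over a 3D volume and return a 256-entry vector."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])

    def voxel(z, y, x):
        # Assumed zero padding: anything outside the volume is absent.
        if z < depth and y < height and x < width:
            return volume[z][y][x]
        return 0

    pfv = [0] * 256  # fixed length and fixed ordering for every image
    for z0 in range(0, depth, stride):
        for y0 in range(0, height, stride):
            for x0 in range(0, width, stride):
                index = 0
                for bit, (dz, dy, dx) in enumerate(CORNER_OFFSETS):
                    if voxel(z0 + dz, y0 + dy, x0 + dx):
                        index |= 1 << bit
                pfv[index] += 1  # update the frequency of this pattern
    return pfv
```

Because the index of each pattern is computed the same way for every window, the same pattern always lands at the same position of the vector, which is the consistency requirement stated above.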
The input layer of the 3D AlexNet is a column-stacked 3D pattern frequency vector (3D-PFV). The hidden layer has n neurons with n weights, and the output layer has m neurons for m classes of objects.

Object Recognition Using 3D Pattern Frequency Vector.
The three-dimensional knowledge vector is a vector of 256 values that stores an image's overall syntactic pattern. The image can be two- or three-dimensional.
The purpose of such a vector is to obtain a succinct representation of a 2D or 3D image. These vectors hold essential information regarding shape and morphology and can therefore be used to represent the image. The corresponding vector is generated using the procedure described in the previous section.

Computational Intelligence and Neuroscience
For each syntactic pattern, its frequency in the image is calculated and fed as an integer to the 3D AlexNet. Since there are 256 patterns, a 256-dimensional vector is created, tentatively called the X-dimensional pattern frequency vector (XDPFV), where X is 3. In the proposed methodology, 3D AlexNet is used for classification. AlexNet was introduced in 2012 by Alex Krizhevsky and colleagues. The 3D AlexNet architecture uses 3D filters instead of 2D filters at the convolution and pooling layers. The 3D AlexNet has eight layers, of which five are convolution layers and three are fully connected layers. The first convolution layer is followed by the first max-pooling layer, and the second convolution layer is followed by the second max-pooling layer. The third, fourth, and fifth convolution layers are connected consecutively, and the fifth convolution layer is followed by the third 3D max-pooling layer. The output generated by the third max-pooling layer becomes the input to the first two fully connected layers. The output of the third fully connected layer is the input to the softmax classifier, and the softmax layer has three class labels. The architecture of 3D AlexNet is represented in Figure 6.

Results and Discussion
In this research, we used four different brain MRI datasets. The first dataset is the "Figshare brain tumor dataset," which is publicly available [33]. This dataset consists of a total of 3064 images from 233 different patients. The second dataset is the "Brain MRI dataset," which is publicly available on Kaggle [35]. This dataset consists of a total of 253 MRI images. The third dataset is the Medical MRI dataset, collected from the Pentagram Research Institute, Hyderabad. This dataset contains 2D MRI slices, which can be combined to make a 3D view of the images. The fourth dataset is the BraTS 2019 dataset [48]. This dataset consists of images of four different contrasts, organized in four folders: fluid-attenuated inversion recovery (FLAIR), T1, T1 contrast-enhanced, and T2.
To build multi-orientation views of each 3D model, we rotated the dataset equally in the horizontal direction and produced the dataset at the 4th and 7th depths; multi-orientation (3, 6, 9, 12, 18, 24, and 30 rotations) is applied only at the 7th depth. Each 3D model's multi-orientation is applied to the training and testing datasets individually. Keras with TensorFlow has been used to train the deep learning model. The system runs the Windows operating system with a GPU processor and 32 GB of RAM. We created a GUI of the proposed system for better analysis, which takes MRI images as input. The GUI of the proposed methodology is shown in Figure 7.
During the brain tumor classification, the 3D AlexNet architecture is used, which requires the values of several hyper-parameters to be set in order to obtain the optimum performance of the architecture. These hyper-parameters are epochs, learning rate, dropout, and batch size. The optimized values of these hyper-parameters for each problem are given in Table 3. Learning rate and early stopping depend on the validation loss for every 5 to 10 epochs. In this implementation, the batch size is 128, the number of epochs is 65, and the number of steps per epoch is 4. Early stopping is introduced in the training process; it stops training if the accuracy and loss remain constant for four consecutive epochs. The validation split is set to 0.1. This parameter allows the model to handle the train/test data splitting on its own: a value of 0.1 uses 90% of the sample dataset for training and holds out 10% for testing. This allows the loss and accuracy values to be plotted correctly. The dropout technique, which ignores a random set of neurons during training, is set to 0.20; a dropout of 20% means that one in five inputs is excluded in each cycle. The learning rate ranges from 0 to 1. A learning rate that is too high can cause the model to converge too swiftly to a suboptimal solution, whereas a learning rate that is too low can cause the training process to get stuck. In the proposed method, we have taken the learning rate as 0.001 so that the network learns slowly to perform object recognition on divergent input images.
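The validation split and the early-stopping rule can be illustrated without any deep learning library. The sketch below is only an approximation of the behavior described above: it holds out the last 10% of the samples (Keras takes its validation fraction from the end of the data), and it monitors validation loss alone with a patience of four epochs, whereas the text mentions both accuracy and loss. The function names and the `min_delta` parameter are hypothetical.

```python
def split_train_validation(samples, validation_split=0.1):
    """Hold out the last `validation_split` fraction of the samples."""
    n_val = round(len(samples) * validation_split)
    cut = len(samples) - n_val
    return samples[:cut], samples[cut:]


def early_stopping_epoch(val_losses, patience=4, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the last epoch is returned.
    """
    best = float("inf")
    stale = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)
```

For a run whose validation loss plateaus from the third epoch onward, the rule stops training four epochs later, well before the configured maximum of 65 epochs.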

Evaluation Parameters.
The performance of the proposed methodology is evaluated by computing the accuracy, sensitivity, specificity, and F1 score defined in Table 4 [47,49]. Higher values of accuracy, sensitivity, and specificity indicate better performance of the algorithm.
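Table 4 is not reproduced in this text, so the sketch below uses the standard confusion-matrix definitions of these measures, which are assumed to match it. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives; the function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        # Fraction of all samples that are classified correctly.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # True positive rate (also called recall).
        "sensitivity": tp / (tp + fn),
        # True negative rate.
        "specificity": tn / (tn + fp),
        # Fraction of predicted positives that are correct.
        "precision": tp / (tp + fp),
        # Harmonic mean of precision and sensitivity.
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```

For a multi-class problem such as glioma, meningioma, and pituitary, these counts are usually taken per class in a one-vs-rest fashion and then averaged.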

Result Analysis. MRI images were selected to evaluate the performance of the proposed system. First, the proposed wireframe model was compared with classical edge detection algorithms, namely the Sobel, Roberts, and Prewitt algorithms. According to the result analysis, the Roberts and Sobel edge detection methods could not produce a closed contour, leading to false tumor detection. The proposed wireframe model can generate a closed structure of the tumor, and the boundary of the tumor is sharper and more visible. Table 5 shows the subjective quality evaluation among the various methods. As shown in Table 5, the proposed wireframe model is also able to find the contour/wireframe model for small objects, which is not possible with Roberts, Sobel, and Prewitt. As per the analysis, the proposed CLAP-based wireframe model performs better than the existing methods.

Quantitative Analysis.
The results obtained from the proposed wireframe model are compared with the existing edge detection methods on the test images, as shown in Table 6. The performance measures are accuracy, sensitivity, and specificity. The accuracy of the proposed wireframe model is 99%, the sensitivity is 87%, and the specificity is 1, which shows the effectiveness of the proposed wireframe model for detecting the edges in the given images. The detailed statistical measures are shown in Figures 8-10.

Result of Classification on Different Datasets.
This section presents a comparative analysis of the classification results of the proposed method and the existing methods over the same datasets. For the classification of different types of tumors, different hyper-parameters were used for the training of the proposed model. The obtained results are evaluated using different quantitative measures, which are already mentioned in Table 6.
The performance of the proposed method is evaluated on the Figshare dataset [33], the Brain MRI Kaggle dataset [35], the Medical MRI dataset, and the BraTS 2019 dataset. The evaluation parameters are accuracy, sensitivity, specificity, precision, recall, F1 score, and mean average precision (mAP). Table 7 shows the comparative analysis of the classification results obtained by existing methods on the Figshare dataset [33]. Among the existing methods, the highest accuracy is obtained using the fine-tuned VGG-16 [61] model. Its authors used fine-tuning with transfer learning to improve the efficiency of the architecture. Figure 9 shows the graphical representation of the proposed and existing method results on the Figshare dataset.
Some authors have used RCNN-based methods for the classification of brain tumors, and Table 8 shows the comparative analysis of these existing methods on the Figshare dataset.
The evaluation parameters are accuracy, mAP, sensitivity, and time. As per the comparative study, the proposed method gives better results in terms of accuracy and time on the Figshare dataset. Figure 10 shows the graphical representation of the proposed and existing method results on the Figshare dataset.
For a better evaluation of the proposed method, we performed the experiments on two other datasets. The accuracy of the proposed method on the brain MRI dataset is 99.17%. As per the comparative study, Mask RCNN is the best-performing existing method with an accuracy of 98.34%, but our proposed system has a 1.14% higher accuracy rate than the existing system. Some other evaluation parameters are used for the validation of the proposed work. The result of the comparative analysis is shown in Table 9. The class-wise accuracy on the Figshare dataset is 98.94%, 99.18%, and 99.02% for the glioma, meningioma, and pituitary classes, respectively. The accuracy of the proposed method is also evaluated on the Medical MRI dataset, and it is 99.23%. The results of the comparative analysis are shown in Table 10.
The performance of the proposed work was also evaluated on the BraTS 2019 dataset. Table 11 shows the average accuracy of the proposed method on the BraTS 2019 dataset. This section compares the proposed method with seven other existing methods, all of which worked on two class labels: LGG and HGG tumors. Zhuge et al. [70] used a deep CNN for the classification of tumors using T1, T2, and T2-FLAIR images and achieved an impressive accuracy of 97.1%. The proposed method used 3D AlexNet for the classification of tumors using T1, T1ce, T2, and T2-FLAIR images and achieved an accuracy of 96.91%, which is 0.19% lower than that of the existing method [70]. Table 12 shows the class-wise performance of the various methods using precision, specificity, recall, and F1 score. For low-grade glioma, the proposed 3D AlexNet achieved the highest precision of 0.925. This value is 0.062% higher than that of the existing pretrained ResNet mixed convolution method. For the high-grade glioma and healthy subject classes, the 3D AlexNet method achieved the highest precision of 0.959 and 0.998, respectively, which are higher than those of the existing ResNet mixed convolution method. In the same way, the recall of the proposed method for the LGG and HGG classes is better than that of the existing one, but for healthy subjects the recall of the proposed method is 0.956, which is 0.039 less than that of the pretrained ResNet mixed convolution method. In terms of specificity for LGG, HGG, and healthy subjects, the best model is the existing ResNet mixed convolution, with specificities of 0.931, 0.959, and 0.999. The F1 score of the proposed method for the LGG and HGG classes is better than that of the existing method, but in the classification of healthy subjects the pretrained ResNet 3D achieved the highest F1 score.

[Figure: comparison of the proposed methodology with existing methods.]

Table 7: Comparison of classification accuracy on the Figshare dataset.
Author | Method | Accuracy (%)
[58] | GoogleNet and SVM | 97.10
Swati et al. [34] | VGG-19 | 94.82
Huang et al. [59] | CNN based on complex networks | 95.49
Gumaei et al. [60] | GIST descriptor and ELM | 94.93
Chakrabarty [35] | Attention module, hyper column technique, residual block | 97.69
Arisha Rehman et al. [61] | Fine-tune AlexNet | 97.39
Arisha Rehman et al. [61] | Fine-tune GoogleNet | 98.04
Arisha Rehman et al. [61] | Fine-tune VGG-16 | 98.69
Proposed methodology | 3D AlexNet | 99.04

Conclusion and Future Work
In this research, a CLAP-based algorithm was used for the detection of the wireframe model. The algorithm took various input images, and their corresponding wireframe models were computed. The wireframe model finds the closed surface of the given tumor image and also provides a 3D visual representation of the images. Qualitative and quantitative comparisons were performed against various edge detection algorithms. The accuracy of the CLAP-based wireframe model was found to be 99%, which is better than the majority of the existing models. Further, a tumor detection and classification system was developed using the wireframe model and the syntactic pattern recognition approach. Experimental analysis reveals that the proposed lightweight brain tumor classification model achieves comparatively better performance than the existing state-of-the-art methods. Furthermore, the proposed system could be combined with existing cancer diagnostic tools to improve the robustness of existing CAD-based systems.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors would like to confirm that there are no conflicts of interest regarding the study.